ZEPPELIN-3375: Make PySparkInterpreter extends PythonInterpreter #2919
zjffdu wants to merge 1 commit into apache:master.
The branch was force-pushed from 6116344 to d6481ea.
@felixcheung @Leemoonsoo Could you help review it? Thanks.
```java
//TODO(zjffdu) don't do hard code on py4j here
File py4jDestFile = new File(pythonWorkDir, "py4j-src-0.9.2.zip");
FileUtils.copyURLToFile(getClass().getClassLoader().getResource(
    "python/py4j-src-0.9.2.zip"), py4jDestFile);
```
Yeah, Spark 2.3 runs with Py4J 0.10.6.
Should this detect a version mismatch here, for example by checking the Spark version?
It is fine to use Py4J 0.9.2 here for IPythonInterpreter. IPySparkInterpreter uses Spark's own Py4J instead of 0.9.2.
I wonder why Spark doesn't ship the Py4J zip file under a version-agnostic name. Filed https://issues.apache.org/jira/browse/SPARK-23965
I don't think this is a strong reason to rename Spark's Py4J zip or add a link to it within Spark. Also, to be clear, I think it's orthogonal to the change here, if I am not mistaken.
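As a sketch of the mismatch detection raised in this thread: Spark distributions typically put a versioned file such as `py4j-0.10.6-src.zip` under `$SPARK_HOME/python/lib`, so the bundled version can be read from the file name instead of being hard-coded. The function name `find_py4j_zip` and the exact file-name pattern are assumptions for illustration, not Zeppelin or Spark APIs (note that Zeppelin's own bundled file uses a different pattern, `py4j-src-0.9.2.zip`).

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed naming convention for Spark's bundled Py4J, e.g. "py4j-0.10.6-src.zip".
PY4J_ZIP_PATTERN = re.compile(r"^py4j-(\d+(?:\.\d+)*)-src\.zip$")


def find_py4j_zip(lib_dir: str) -> Optional[Tuple[Path, str]]:
    """Return (path, version) of the Py4J source zip in lib_dir, or None if absent."""
    matches = []
    for entry in Path(lib_dir).iterdir():
        m = PY4J_ZIP_PATTERN.match(entry.name)
        if m:
            matches.append((entry, m.group(1)))
    if not matches:
        return None
    # If several versions are present, prefer the highest one (numeric comparison).
    return max(matches, key=lambda pv: tuple(int(x) for x in pv[1].split(".")))
```

A caller would pass something like `os.path.join(os.environ["SPARK_HOME"], "python", "lib")` and compare the returned version with whatever the interpreter expects, rather than assuming 0.9.2.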
```java
  throw new IOException("Fail to run shell commands: " + StringUtils.join(commands, " "));
}
logger.info("Complete shell commands: " + StringUtils.join(commands, " "));
return outputGobbler.getOutput();
```
I thought we had some launcher wrapper for something like this?
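For comparison, the run-and-capture pattern in the hunk above (fail on a non-zero exit code, log completion, return the collected output) can be sketched with Python's standard `subprocess` module. `run_shell_commands` is a hypothetical name, and merging stderr into stdout is an assumption about what the Java output gobbler collects.

```python
import logging
import subprocess
from typing import List

logger = logging.getLogger(__name__)


def run_shell_commands(commands: List[str]) -> str:
    """Run a command, raise IOError on a non-zero exit code, and return its output."""
    cmd = " ".join(commands)
    # stderr is merged into stdout, roughly what a single output gobbler would collect.
    result = subprocess.run(commands, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    if result.returncode != 0:
        raise IOError("Failed to run shell commands: " + cmd)
    logger.info("Completed shell commands: %s", cmd)
    return result.stdout
```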
```java
try {
  interpreter.close();
} catch (InterpreterException e) {
  LOGGER.warn("Fail to close interpreter: " + interpreter.getClassName());
```
Would the exception stack trace be useful?
Maybe just `LOGGER.warn(..., e);`?
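The suggestion is to pass the exception to the logger so the stack trace is kept. In Java that is the `LOGGER.warn(message, e)` overload; a Python analogue of the same close-and-log pattern uses `exc_info`. `InterpreterException` and `close_quietly` below are hypothetical stand-ins, not Zeppelin code.

```python
import logging

LOGGER = logging.getLogger("interpreter")


class InterpreterException(Exception):
    """Hypothetical stand-in for Zeppelin's Java InterpreterException."""


def close_quietly(interpreter) -> None:
    """Close the interpreter, logging (not raising) any InterpreterException."""
    try:
        interpreter.close()
    except InterpreterException as e:
        # exc_info=e attaches the traceback to the log record,
        # the analogue of LOGGER.warn(message, e) in Java.
        LOGGER.warning("Failed to close interpreter: %s",
                       type(interpreter).__name__, exc_info=e)
```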
```python
jsc = intp.getJavaSparkContext()
```

```python
if sparkVersion.isImportAllPackageUnderSparkSql():
```
It is only used when the Spark version is lower than 1.3. There is a lot of code in Zeppelin that targets specific old Spark versions. I don't think we need it: we have no tests for it, so no one knows whether it still works. I think it is time for Zeppelin to drop support for old Spark versions, but that requires more work, so I will do it in a separate PR.
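The branch discussed here only matters for Spark versions below 1.3. A minimal sketch of such a version gate, assuming the check means "Spark 1.3.0 or newer" (as the reply implies that the other branch serves older versions); `parse_spark_version` and `is_import_all_package_under_spark_sql` are hypothetical Python helpers, not Zeppelin's Java `SparkVersion` API.

```python
from typing import Tuple


def parse_spark_version(version: str) -> Tuple[int, ...]:
    """Parse strings such as '2.3.0' or '1.6.3-SNAPSHOT' into (major, minor, patch)."""
    core = version.split("-", 1)[0]           # drop suffixes like "-SNAPSHOT"
    parts = [int(p) for p in core.split(".")[:3]]
    parts += [0] * (3 - len(parts))           # pad "1.2" to (1, 2, 0)
    return tuple(parts)


def is_import_all_package_under_spark_sql(version: str) -> bool:
    # Assumption: the check is a plain "version >= 1.3.0" comparison.
    return parse_spark_version(version) >= (1, 3, 0)
```

Comparing tuples keeps the ordering numeric, so "1.10.0" sorts after "1.3.0", which a plain string comparison would get wrong.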
```java
try {
  bootstrapInterpreter("python/zeppelin_pyspark.py");
} catch (IOException e) {
  e.printStackTrace();
```

```python
except:
  raise Exception(traceback.format_exc())
if not isForCompletion:
  exception = traceback.format_exc()
```
Could you add a comment explaining what this is looking for and what it typically looks like?
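The kind of comment requested might read like the docstring in this sketch: `traceback.format_exc()` returns the full formatted traceback of the exception being handled, starting with "Traceback (most recent call last):" and ending with the exception type and message. `run_user_code` is a hypothetical helper; it assumes, as the hunk suggests, that errors are suppressed for completion requests.

```python
import traceback
from typing import Optional


def run_user_code(code: str, namespace: dict, is_for_completion: bool) -> Optional[str]:
    """Execute code; return the formatted traceback on failure, else None.

    traceback.format_exc() starts with "Traceback (most recent call last):",
    lists the frames, and ends with a line such as
    "ZeroDivisionError: division by zero".
    """
    try:
        exec(code, namespace)
    except Exception:
        if is_for_completion:
            # Completion requests should not surface errors to the user.
            return None
        return traceback.format_exc()
    return None
```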
```python
gateway = JavaGateway(client, auto_convert = True)
intp = gateway.entry_point
# redirect stdout/stderr to java side so that PythonInterpreter can capture the python execution result
output = Logger()
```
There have been problems reported with these names being too common and conflicting with existing variables (output, gateway, client, etc.). We should give them unique names if we can, even "temporary" ones, since in Python these variables have global scope.
It is fine to use them, because they are not in the same namespace as user code (they are not visible to users).
I'm not sure. This sets them globally, right? For example, z is accessible from user code.
User code runs in the namespace _zcUserQueryNameSpace instead of the global namespace: https://github.com/zjffdu/zeppelin/blob/ZEPPELIN-3375/python/src/main/resources/python/zeppelin_python.py#L91
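A minimal sketch of the isolation described in this reply: user code is executed with `exec()` against a separate globals dict, so bootstrap-level names such as `output` are not shadowed, and only names deliberately placed in that dict (here a stub `z`) are visible to users. The dict and helper names are hypothetical; how the linked script actually populates `_zcUserQueryNameSpace` is not shown in this thread.

```python
# Bootstrap-level name, analogous to gateway/output in the hunk above.
output = "bootstrap logger"

# Hypothetical user namespace: in this sketch, only names placed here are visible to user code.
user_namespace = {"z": "zeppelin context stub"}


def run_in_user_namespace(code: str) -> None:
    # exec() with an explicit globals dict: assignments land in user_namespace,
    # not in this module's globals.
    exec(code, user_namespace)


run_in_user_namespace("output = 'user value'\nseen_z = z")
```

After this runs, the module-level `output` is unchanged, while the user's `output` and `seen_z` live only in `user_namespace`.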
```python
  return None
else:
  objectDefList = execResult['objectDefList']
  return [completion for completion in execResult['objectDefList'] if completion.startswith(methodName)]
```
About startswith: I think partial matching, not necessarily from the beginning, would often be better?
Maybe. There is a lot of work to do to improve code completion, but this PR is already large and I don't want to cover too much in a single PR.
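To make the prefix-versus-partial-match question concrete, here is a sketch of a completion filter supporting both modes. `filter_completions` and its `mode` parameter are hypothetical; the PR itself keeps the prefix behaviour (`startswith`).

```python
from typing import Iterable, List


def filter_completions(candidates: Iterable[str], text: str, mode: str = "prefix") -> List[str]:
    """Filter completion candidates by prefix (current behaviour) or by substring."""
    if mode == "prefix":
        return [c for c in candidates if c.startswith(text)]
    if mode == "substring":
        return [c for c in candidates if text in c]
    raise ValueError("unknown mode: " + mode)
```

With candidates `["count", "collect", "describe", "distinct", "toDF"]`, typing `ct` yields nothing in prefix mode but `["collect", "distinct"]` in substring mode; the trade-off is a longer, noisier list for short inputs.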
The branch was force-pushed from 91ef36d to 4d30a5e.
Will merge if there are no more comments.
### What is this PR for?
This PR removes the code duplication between PySparkInterpreter and PythonInterpreter. The main changes are:
* PySparkInterpreter extends PythonInterpreter
* PySparkInterpreterTest extends PythonInterpreterTest, so that we can verify PySparkInterpreter can do whatever PythonInterpreter can do
* interpreter/lib/python/backend_zinline.py and interpreter/lib/python/mpl_config.py are moved into the python module, so that the python module can ship these resources together

### What type of PR is it?
[ Improvement | Refactoring]

### Todos
* [ ] - Task

### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-3375

### How should this be tested?
* CI pass

### Screenshots (if appropriate)

### Questions:
* Do the license files need an update? No
* Are there breaking changes for older versions? No
* Does this need documentation? No

Author: Jeff Zhang <zjffdu@apache.org>

Closes apache#2919 from zjffdu/ZEPPELIN-3375 and squashes the following commits:

738c6c5 [Jeff Zhang] ZEPPELIN-3375. Make PySparkInterpreter extends PythonInterpreter
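The testing idea in the first two bullets (the subclass inherits the base interpreter, and the subclass's test case inherits the base test case so every base test also runs against the subclass) can be sketched in Python with `unittest`. All classes below are hypothetical, heavily simplified stand-ins for the Java classes named in the PR; there is no real Spark here.

```python
import unittest


class PythonInterpreter:
    """Hypothetical, much-simplified stand-in for the Java PythonInterpreter."""

    def __init__(self):
        self.namespace = {}
        self.bootstrap()

    def bootstrap(self):
        # Shared setup that every Python-based interpreter needs.
        self.namespace["__name__"] = "__zeppelin__"

    def interpret(self, code):
        exec(code, self.namespace)
        return self.namespace.get("result")


class PySparkInterpreter(PythonInterpreter):
    """Reuses everything from PythonInterpreter and only adds Spark-specific setup."""

    def bootstrap(self):
        super().bootstrap()
        self.namespace["spark"] = "SparkSession stub"  # placeholder, not a real session


class PythonInterpreterTest(unittest.TestCase):
    def create_interpreter(self):
        return PythonInterpreter()

    def test_basic_python(self):
        intp = self.create_interpreter()
        self.assertEqual(intp.interpret("result = 1 + 1"), 2)


class PySparkInterpreterTest(PythonInterpreterTest):
    # Inherits test_basic_python, so that test also runs against PySparkInterpreter.
    def create_interpreter(self):
        return PySparkInterpreter()

    def test_spark_available(self):
        intp = self.create_interpreter()
        self.assertEqual(intp.interpret("result = spark"), "SparkSession stub")
```

The factory method `create_interpreter` is the only thing the subclass test overrides, which is what lets the inherited tests exercise the new interpreter unchanged.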
The same commit message also appears on cherry-picked commits, with the trailers "(cherry picked from commit 0a97446)" and "(cherry picked from commit 595d45b)".